Raw accelerometry data collected during outdoor walking, stair climbing, and driving for 32 healthy adults.
The Methods description provided below is sourced from Reference [1] (Section: “2 Data Collection”) except from one note referenced elsewhere.
Database files contain:
raw_accelerometry_data - a directory with 32 data files in CSV format. Each file corresponds to raw accelerometry data measurements of 1 study participant. File names follow the convention: "subj_id.csv". Each file contains 14 variables (column name in italics):raw_accelerometry_data_dict.csv - a CSV file containing the description of 14 variables that each file in the raw_accelerometry_data directory consists of.
participant_demog.csv - a CSV file with participants demographic information. The file contains 7 variables (column name in italics):
All data are anonymized.
(Publishing in progress) PhysioNet - platform that offers free access to large collections of physiological and clinical data and related open-source software. View on PhysioNet:
tmT6aiXgBz07VUnF1qIg (Available today) Personal Dropbox. Exaclty the same data and data structure as we submitted to PhysioNet, just zipped in one directory. See below instructions for downloading.
library(data.table)
library(magrittr)
library(reshape2)
library(ggplot2)
library(latex2exp)
library(raster)
library(dplyr)
select <- dplyr::select; filter <- dplyr::filter; summarize <- dplyr::summarize
mutate <- dplyr::mutate
wd_dat.dir <- file.path(wd, "data")
acc_dat.dir <- file.path(wd_dat.dir, "IU_walking_driving_climbing")
acc_dat.zip <- file.path(wd_dat.dir, "IU_walking_driving_climbing.zip")
acc_dat.zip.url <- "https://www.dropbox.com/s/pf5l2ki9t2ae2df/IU_walking_driving_climbing.zip?dl=1"
## Make data dir if does not exist yet
if (!dir.exists(wd_dat.dir)){
dir.create(wd_dat.dir)
}
## If files not downloaded yet, download (182.6 MB)
if (!dir.exists(acc_dat.dir) & !file.exists(acc_dat.zip)){
## Download zip, unzip, remove zip
download.file(acc_dat.zip.url, acc_dat.zip)
unzip(acc_dat.zip, exdir = acc_dat.dir)
rm(acc_dat.zip)
}
Read demographic data of all 32 participants
fpath.demog <- file.path(acc_dat.dir, "participant_demog.csv")
df.demog <- as.data.frame(fread(fpath.demog))
head(df.demog)
## subj_id gender age height_in weight_lbs race right_handed
## 1 idabd0c53c male 23 72 165 caucasian 1
## 2 id5993bf4a female 45 69 141 caucasian 1
## 3 idd80ac2b4 female 29 73 154 caucasian 1
## 4 id82b9735c male 43 71 185 caucasian 1
## 5 id8af5374b male 47 75 238 caucasian 1
## 6 id650857ca male 34 70 140 caucasian 1
What unzipped IU data dir contains?
list.files(acc_dat.dir)
## [1] "participant_demog.csv" "raw_accelerometry_data"
## [3] "raw_accelerometry_data_dict.csv"
list.files(file.path(acc_dat.dir, "raw_accelerometry_data"))
## [1] "id00b70b13.csv" "id079c763c.csv" "id1165e00c.csv" "id1c7e64ad.csv" "id1f372081.csv"
## [6] "id34e056c8.csv" "id37a54bbf.csv" "id3e3e50c7.csv" "id4ea159a8.csv" "id5308a7d6.csv"
## [11] "id5993bf4a.csv" "id650857ca.csv" "id687ab496.csv" "id7c20ee7a.csv" "id82b9735c.csv"
## [16] "id86237981.csv" "id8af5374b.csv" "id8e66893c.csv" "id9603e9c3.csv" "ida61e8ddf.csv"
## [21] "idabd0c53c.csv" "idb221f542.csv" "idbae5a811.csv" "idc735fc09.csv" "idc91a49d0.csv"
## [26] "idd80ac2b4.csv" "idecc9265e.csv" "idf1ce9a0f.csv" "idf540d82b.csv" "idf5e3678b.csv"
## [31] "idfc5f05e4.csv" "idff99de96.csv"
Read raw accelerometry data of 1 study participant.
IU_dat.raw.fnames <- list.files(file.path(acc_dat.dir, "raw_accelerometry_data"))
fname.i <- IU_dat.raw.fnames[1]
fpath.i <- file.path(acc_dat.dir, "raw_accelerometry_data", fname.i)
df.i <- as.data.frame(fread(fpath.i))
head(df.i)
## activity time_s lw_x lw_y lw_z lh_x lh_y lh_z la_x la_y la_z ra_x ra_y ra_z
## 1 99 0.01 0.039 1.020 -0.020 -0.180 1.234 0.023 0.156 0.855 -0.582 0.887 0.449 0.074
## 2 99 0.02 -0.629 -0.461 0.973 -0.246 0.137 0.969 -0.707 0.559 0.449 -0.027 0.141 0.934
## 3 99 0.03 -0.926 -1.262 0.691 0.238 -0.328 1.219 -1.445 1.367 0.367 -0.164 0.660 1.344
## 4 99 0.04 -0.871 -1.496 -0.246 0.711 -0.484 0.414 -1.660 1.645 -0.543 -0.141 1.027 0.707
## 5 99 0.05 -0.727 -1.621 -0.559 1.031 -0.297 0.145 -1.762 1.676 -0.918 -0.121 1.230 0.438
## 6 99 0.06 -0.543 -1.664 -0.629 1.121 -0.246 0.137 -1.797 1.648 -0.988 -0.051 1.422 0.371
Map activity type (map values were learned from raw_accelerometry_data_dict.csv) file.
map_vec <- rep(NA, 99)
map_vec[c(1,2,3,4,77,99)] <- c("walking", "descending_stairs", "ascending_stairs",
"driving", "clapping", "non_study_activity")
df.i$activity_lab <- map_vec[df.i$activity]
table(df.i$activity_lab)
##
## ascending_stairs clapping descending_stairs driving non_study_activity
## 9125 3544 9766 128537 103888
## walking
## 48440
Duration time of accelerometry data collection.
## Frequency of data collection (sensor setting; number of observations per second)
hz <- 100
## How many minutes of data colleciton total?
round(nrow(df.i) / (hz * 60), 2)
## [1] 50.55
## How many minutes of data for each activity?
round(table(df.i$activity_lab) / (hz * 60), 2)
##
## ascending_stairs clapping descending_stairs driving non_study_activity
## 1.52 0.59 1.63 21.42 17.31
## walking
## 8.07
Reshape data.
df.lw <- df.i %>%
select(activity_lab, time_s, x = lw_x, y = lw_y, z = lw_z) %>%
mutate(loc_id = "left_wrist")
df.lh <- df.i %>%
select(activity_lab, time_s, x = lh_x, y = lh_y, z = lh_z) %>%
mutate(loc_id = "left_hip")
df.la <- df.i %>%
select(activity_lab, time_s, x = la_x, y = la_y, z = la_z) %>%
mutate(loc_id = "left_ankle")
df.ra <- df.i %>%
select(activity_lab, time_s, x = ra_x, y = ra_y, z = ra_z) %>%
mutate(loc_id = "right_ankle")
df.all <- rbind(df.lw, df.lh, df.la, df.ra)
rm(df.lw, df.lh, df.la, df.ra, df.i)
head(df.all)
## activity_lab time_s x y z loc_id
## 1 non_study_activity 0.01 0.039 1.020 -0.020 left_wrist
## 2 non_study_activity 0.02 -0.629 -0.461 0.973 left_wrist
## 3 non_study_activity 0.03 -0.926 -1.262 0.691 left_wrist
## 4 non_study_activity 0.04 -0.871 -1.496 -0.246 left_wrist
## 5 non_study_activity 0.05 -0.727 -1.621 -0.559 left_wrist
## 6 non_study_activity 0.06 -0.543 -1.664 -0.629 left_wrist
Plot 0-10 seconds of raw accelerometry data from walking, across four sensor locations.
loc_id.level <- c("left_wrist", "left_hip", "left_ankle", "right_ankle")
activity_lab.level <- c("non_study_activity", "driving", "clapping", "walking",
"ascending_stairs", "descending_stairs")
plt.df <-
df.all %>%
filter(activity_lab == "walking") %>%
group_by(loc_id) %>%
mutate(time_s_act = row_number(time_s) / 100) %>%
filter(time_s_act >= 0, time_s_act <= 10) %>%
select(-time_s, -activity_lab) %>%
reshape2::melt(id.vars = c("time_s_act", "loc_id")) %>%
mutate(loc_id = factor(loc_id, levels = loc_id.level))
ggplot(plt.df, aes(x = time_s_act, y = value, color = variable)) +
geom_line() +
facet_grid(loc_id ~ .) +
labs(color = "Sensor\naxis", y = TeX("Acceleration measurement \\[\\textit{g}\\]"), x = "Exercise time [s]")
Plot 0-5 seconds of raw accelerometry data, across four sensor locations, across various activities.
plt.df <-
df.all %>%
filter(activity_lab != "non_study_activity") %>%
group_by(loc_id, activity_lab) %>%
mutate(time_s_act = row_number(time_s) / 100) %>%
filter(time_s_act >= 0, time_s_act <= 5) %>%
select(-time_s) %>%
reshape2::melt(id.vars = c("time_s_act", "loc_id", "activity_lab")) %>%
mutate(loc_id = factor(loc_id, levels = loc_id.level),
activity_lab = factor(activity_lab, levels = activity_lab.level))
ggplot(plt.df, aes(x = time_s_act, y = value, color = variable)) +
geom_line() +
facet_grid(loc_id ~ activity_lab) +
labs(color = "Sensor\naxis", y = TeX("Acceleration measurement \\[\\textit{g}\\]"), x = "Exercise time [s]")
Image source: MathWorks documentation: https://www.mathworks.com/help/matlab/ref/cart2sph.html .
The mapping from three-dimensional Cartesian coordinates \((x,y,z)\) to spherical coordinates \((\text{azimuth},\text{elevation},r)\) is given by:
\[ \begin{aligned} \text{azimuth} & = {\displaystyle \operatorname {arctan2} (y,x)}, \\ \text{elevation} & = {\displaystyle \operatorname {arctan2} (z,\sqrt{x^2 + y^2})}, \\ r & = \sqrt{x^2 + y^2 + z^2}, \end{aligned} \] where \({\displaystyle \operatorname {arctan2}}\) is defined as the angle in the Euclidean plane, given in radians, between the positive \(x\)-axis and the ray to the point \((x,y) \neq (0,0)\).
Note:
vm (vector magnitude) column instead of r for 3rd coordinate.## Compute spherical variables
df.all %<>%
mutate(azimuth = atan2(y, x),
elevation = atan2(z, sqrt(x^2 + y^2)),
vm = sqrt(x^2 + y^2 + z^2))
## Observe range of values
range(df.all$azimuth) ## theoretical range: [-pi, pi]
## [1] -3.139973 3.141593
range(df.all$elevation) ## theoretical range: [-pi/2, pi/2]
## [1] -1.570796 1.558254
range(df.all$vm) ## theoretical range: [0, inf)
## [1] 0.000000 8.735085
Spherical coordinate \(r\) (radius), aka vector magnitude \((vm)\), is often used to reduce the dimensionality of raw accelerometry time-series \((x,y,z)\). Again, it is computed as \(\sqrt{x^2 + y^2 + z^2}\) at each time point resulting in 1- instead of 3-dimensional time-series.
Plot 0-5 seconds of vector magnitude of accelerometry data, across four sensor locations, across various activities.
plt.df <-
df.all %>%
filter(activity_lab != "non_study_activity") %>%
group_by(loc_id, activity_lab) %>%
mutate(time_s_act = row_number(time_s) / 100) %>%
filter(time_s_act >= 0, time_s_act <= 5) %>%
group_by() %>%
mutate(loc_id = factor(loc_id, levels = loc_id.level),
activity_lab = factor(activity_lab, levels = activity_lab.level))
ggplot(plt.df, aes(x = time_s_act, y = vm, group = 1)) +
geom_line() +
facet_grid(loc_id ~ activity_lab) +
labs(y = TeX("Vector magnitude measurement \\[\\textit{g}\\]"), x = "Exercise time [s]")
Plot 0-5 seconds of vector magnitude, azimuth and elevation coordinates of accelerometry data from walking, collected at wrist.
plt.df <-
df.all %>%
filter(activity_lab == "walking",
loc_id == "left_wrist") %>%
group_by(loc_id, activity_lab) %>%
mutate(time_s_act = row_number(time_s) / 100) %>%
filter(time_s_act >= 0, time_s_act <= 10) %>%
select(time_s_act, loc_id, activity_lab, vm, azimuth, elevation) %>%
reshape2::melt(id.vars = c("time_s_act", "loc_id", "activity_lab")) %>%
mutate(loc_id = factor(loc_id, levels = loc_id.level),
activity_lab = factor(activity_lab, levels = activity_lab.level))
ggplot(plt.df, aes(x = time_s_act, y = value, color = variable)) +
geom_line() +
facet_grid(variable ~ activity_lab, scales = "free_y") +
labs(color = "Spherical\ncoordinate", y = TeX("Measurement"), x = "Exercise time [s]",
title = "Walking accelerometry data collected at left wrist")
Example: arm swing movement range, expressed in angle degrees
## Azminuth values range in first 4 seconds
azimuth_vals <-
plt.df %>%
filter(variable == "azimuth", time_s_act < 4) %>%
pull(value)
## Azminuth values range: conversion of radians to degrees
range(azimuth_vals) * 180 / pi
## [1] -135.88140 -30.85325
abs(diff(range(azimuth_vals) * 180 / pi))
## [1] 105.0282
In reality, it is often challenging to make a plot of all data points collected at sampling frequency of 100 Hz even in 50 minutes-long time-series.
A one way to summarize accelerometry data of such high density is to use a \((vmc)\) - vector magnitude count (also known as the mean amplitude deviation). For \(\overline{vm}(t,H)\) - average of \((vm)\) time-series over time window of length \(H\) starting at time \(t\), we define \[\mathrm { vmc } ( t, H ) = \frac { 1 } { H } \sum _ { h = 0 } ^ { H - 1 } \left| vm ( t + h ) - \overline{vm}(t,H) \right|.\]
Compute \(vmc\) for data collected at wrist from whole ~ 50 minutes of data. Use \(H = 3s\) window.
df.sub <-
df.all %>%
filter(loc_id == "left_wrist") %>%
arrange(time_s)
## Function to compute vmc from a vm window vector
vmc <- function(vm.win){
mean(abs(vm.win - mean(vm.win)))
}
## Compute vmc vector in 3-seconds windows
vm <- df.sub$vm
## Vector length of 3-seconds data window given data collection frequency 100 Hz
hz <- 100
win.vl <- hz * 3
rn.seq <- seq(1, to = length(vm), by = win.vl)
vmc.vec <- sapply(rn.seq, function(rn.i){
vm.win.idx <- rn.i : (rn.i + win.vl - 1)
vm.win <- vm[vm.win.idx]
vmc(vm.win)
})
vmc.df <- data.frame(
vmc = vmc.vec,
time_s = df.sub$time_s[rn.seq],
rn_seq = rn.seq,
activity_lab = df.sub$activity_lab[rn.seq]) %>%
filter(activity_lab != "non_study_activity") %>%
mutate(activity_lab = factor(activity_lab, levels = activity_lab.level))
ggplot(vmc.df, aes(x = time_s / 60, y = vmc, color = activity_lab)) +
geom_point(alpha = 0.7) +
labs(x = "Experiment time [s]", y = "Vector magnitude count",
title = "Vector magnitude count (in 3-sec windows) over the experiment time",
color = "Activity\nlabel") +
theme_minimal()
Karas, M., Straczkiewicz, M., Fadel, W., Harezlak, J., Crainiceanu, C., Urbanek, J.K. Adaptive empirical pattern transformation (ADEPT) with application to walking stride segmentation (2019) Biostatistics.
Proposes adaptive empirical pattern transformation (ADEPT) method for precise automatic pattern segmentation from a time-series.
Applied to perform walking strides segmentation from raw accelerometry data of 32 study participants, from data from each of four sensor locations.
For each participant: estimated cadence (number of steps per second) trajectory, stride pattern.
The highest Biostatistics journal reproducibility rank - all manuscript results available on manuscript GitHub repo.
R package adept implements ADEPT method.
Vignette 1: Introduction to adept package
Vignette 2: Walking strides segmentation with adept
Straczkiewicz, M., Urbanek, J.K., Fadel, W.F., Crainiceanu, C.M., Harezlak, J. Automatic car driving detection using raw accelerometry data (2016) Physiological Measurement.
Proposed a Detection Algorithm of Driving via Accelerometry (DADA), designed to detect time periods when an individual is driving a car.
The methodological approach is based on short-time Fourier transform (STFT) applied to the raw accelerometry data and identifies and focuses on frequency vibration ranges that are specific to car driving.
Fadel, W.F., Urbanek, J.K., Albertson, S.R., Li, X., Chomistek, A.K., Harezlak, J. Differentiating Between Walking and Stair Climbing Using Raw Accelerometry Data (2019) Statistics in Biosciences.
Classification method for the detection of walking and its subclasses (level walking, descending stairs, and ascending stairs).
Evaluate the effects of sensor location and tuning parameters on the classification accuracy of the tree models.
Room for exploration: associations between features of accelerometry data and demographics info
Association of features of walking and participants demographics projects.
Prediction/classification projects.
Can we predict age/sex/BMI of participant based on walking data? Based on driving data?
Can we predict walking subtype (level walking, ascending stairs, descending stairs) purely from data? What if demographics are added?
[1]: Fadel, W.F., Urbanek, J.K., Albertson, S.R., Li, X., Chomistek, A.K., Harezlak, J. Differentiating Between Walking and Stair Climbing Using Raw Accelerometry Data (2019) Statistics in Biosciences, 11(2), 334–354. doi: 10.1007/s12561-019-09241-7
[2]: Straczkiewicz, M., Urbanek, J.K., Fadel, W.F., Crainiceanu, C.M., Harezlak, J. Automatic car driving detection using raw accelerometry data (2016) Physiological Measurement, 37(10), 1757–1769.
[3]: Karas, M., Bai, J., Strączkiewicz, M., Harezlak, J., Glynn, N. W., Harris, T., … Urbanek, J. K. (2019). Accelerometry Data in Health Research: Challenges and Opportunities. Review and Examples. (2019). Statistics in Biosciences, 11, 210–237.
[4]: Karas, M., Straczkiewicz, M., Fadel, W., Harezlak, J., Crainiceanu, C., Urbanek, J.K. Adaptive empirical pattern transformation (ADEPT) with application to walking stride segmentation (2019) Biostatistics.